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Abstract. 

We study a stochastic nonlocal PDE, arising in the context of modelling spatially 
distributed neural activity, which is capable of sustaining stationary and moving 
spatially- localized "activity bumps" . This system is known to undergo a pitchfork 
bifurcation in bump speed as a parameter (the strength of adaptation) is changed; 
yet increasing the noise intensity effectively slowed the motion of the bump. Here 
we revisit the system from the point of view of describing the high-dimensional 
stochastic dynamics in terms of the effective dynamics of a single scalar "coarse" 
variable. We show that such a reduced description in the form of an effective Langevin 
equation characterized by a double-well potential is quantitatively successful. The 
effective potential can be extracted using short, appropriately- initialized bursts of 
direct simulation. We demonstrate this approach in terms of (a) an experience-based 
"intelligent" choice of the coarse observable and (b) an observable obtained through 
data-mining direct simulation results, using a diffusion map approach. 
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-u(x,t) + / J(x-y)f[I + u(y,t)-a(x,t)]dy (1) 



1. Introduction 

Pattern formation at large spatial scales in the cortex is a field of much recent 
interest [5], [TQl [151 EH]- Neural dynamics are intrinsically noisy, so it is natural to 
study the effects of noise on models for such pattern formation. In this paper we 
revisit a noisy model for pattern formation in a neural system [27J but this time 
concentrating on describing its dynamics in a low-dimensional way. We will study the 
partial integrodifferential system 
du(x, t) 
dt ~~ 

i ^ = Au(x, t) - a(x, t) (2) 

on the domain — tt < x < it with periodic boundary conditions, originally analyzed by 
Laing and Longtin |27J as a simple model for pattern formation in a neural field. The 
variable u(x,t) describes the neural activity at position x and time t. The function J 
describes the spatial coupling within the network, with J(x — y) being the strength of 
coupling between neurons at position x and neurons at position y. The function f(I) 
describes the firing rate of a single neuron with input /. The quantities A, I and r 
are constant. The value of A determines the strength of adaptation, J is a background 
current applied to all neurons and r is the adaptation time constant. Equation (J2]) 
represents the effects of spike frequency adaptation. Without (j2J) systems of this form 
have been used to model orientation tuning in the visual system, working memory and 
the head direction system [T5l [5] . Variants including (J2J) have been subsequently explored 

HDnmsn. 

In our numerical implementation we spatially discretise the domain with M equally 
spaced points, resulting in the 2M ODEs 

dui In -s—^ 

~dt = ~ Ui+ Ml^ Ji ^^ I + Ui ~ ai ^ ( 3 ) 
i=i 

dch _ _ 

dt w 
for i = 1, . . . , M, where = J(2Tr\i — j\/M). As in Laing and Longtin [27] we set 
I = -0.1, r = 5 and use f (x) = (1 + tanh(10x))/2 and J(x) = 0.05 + 0.24cosx, with 
M = 100. We add noise to the system by adding a Gaussian white noise term to 
each of (|3J), with = and (6(t)Cj( s )) — 2r]UijS(t — s), where = if i ^ j and 1 

if i = j. Initially we set the parameters A = 0.17 and r] = 10~ 4 ; we will study the effect 
of varying them below. 

At the deterministic limit r] = 0, for these parameter values the system dynamics 
ultimately exhibit a single stable activity bump that moves around the (periodic) domain 
in one direction (determined by the initial condition), at a constant velocity. If rj ^ 0, 
the bump motion becomes stochastic, exhibiting occasional switches in direction. This 
effect, studied by Laing and Longtin [27], is illustrated in figure [lj they concentrated 
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Figure 1. A typical simulation of the system (j3])-(j4]). The profile of u is shown 
as a function of space and time (values indicated by shadebar). Parameters are 
M = 100, r\ = 10- 4 , A = O.f 7, r = 5. 

on the net distance travelled during a finite time interval. For small rj the bump 
moved in only one direction, so that the net distance travelled was quite far (effectively 
proportional to the time); for r\ greater than a certain value, switches in direction lead to 
a much smaller net distance travelled. This phenomenon was referred to as noise-induced 
stabilisation. 

Laing and Longtin [27] showed that for a coupling function of the form used here, 
the system (JTJ)— (J2J) underwent a pitchfork bifurcation in speed as A was increased. They 
then added Gaussian white noise to the normal form of a pitchfork bifurcation, and 
qualitatively reproduced the noise-induced stabilisation. However, the actual derivation 
of a "noisy normal form" from the original system with noise was not attempted. In 
this paper we will demonstrate numerically that the noisy discretised system (JHJ)- (J4j) 
undergoes what we will characterize as an effective pitchfork bifurcation in speed as 
A is increased, and investigate the effects of varying noise intensity. This computer- 
assisted bifurcation analysis will be performed assuming that the dynamics of the high- 
dimensional stochastic system ©-(JID can be effectively described by the dynamics of a 
single, scalar coarse-grained observable (variable). The evolution of this variable can be 
determined by running many short, appropriately-initialized simulations of (J3])- (SJ) , in 
the spirit of the "equation-free" framework [231 12S] . 

The value of our scalar variable V (defined below) can be interpreted as the position 
of a "particle" moving in a potential, subject to noise. Beyond the effective pitchfork 
bifurcation the potential is double- welled; before it, it is single- welled. Each of the two 
wells corresponds to persistent motion in one direction (left or right); in the case of a 
single well we have a stationary (on average) activity bump. Since the system is isotropic, 
we expect the double well to be symmetric. Although we cannot analytically derive the 
effective potential and noise in our hypothesized Langevin description for V, we will show 
how they can be estimated using appropriately initialized short bursts of simulation 
of ([3])- (J4j) . Once the effective potential is approximated, we can locate its maxima 
and minima (which take the place of unstable and stable fixed points for the particle, 
respectively) and estimate average transition times between minima (corresponding to 
bump direction switches). By determining how the potential changes as A and 7] are 
varied we can also numerically obtain a bifurcation diagram for its extrema, which 
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takes the place of the traditional steady-state bifurcation diagram for the deterministic 
problem; identifying transition states (potential maxima, corresponding to unstable 
steady states) would be extremely difficult if not impossible using long simulations 
alone. An interesting point is that we can quantify the relationship between the noise 
intensity in the original system, rj, and the effective noise intensity, D(V), to which the 
particle in the potential is subject. 



2. Results 



2.1. Reconstructing the potential 

We choose as our coarse scalar variable, V(t), the instantaneous difference in position 
between the peak of u(x, t) and the peak of a(x, t). This choice comes from observations 
of the system dynamics: practically all system profiles during a long simulation are 
characterized by a single peak in each of the variable fields. Furthermore, the difference 
between these peaks is clearly related to the instantaneous speed of the bump: the peak 
of a normally lags the peak of u relative to the direction of bump motion. One can 
clearly see in figure [2] that periods of bump motion to the left (right) are characterized 
by effectively constant negative (positive) V(t). Factoring out the small amplitude noisy 
oscillations riding on the bump waveform, we identify the position of the peak of u(x, t) 
as the location c u that satisfies 

M 

^ sin (xi - c u )ui(t) = (5) 

i=l 

where = 2iri/M. (Effectively, we are finding the amount by which a simple sine wave 
must be shifted so that its inner product with u(x,t) is zero.) Similarly, c a satisfies 

M 

22 sin ( x i - c a)ai(t) = (6) 
1=1 

so that V = c u — c a . Results for a typical simulation are shown in figure [2] and a typical 
probability distribution of V, for the same parameters, is shown in figure El 

To establish the connection between V and the effective potential 3>(V) we assume 
that V(t) satisfies an (unknown) Langevin equation. Then, the probability density 
P(V,t) satisfies a Fokker-Planck equation of the form 



dP(V, t) 



2 



p(y,t). (?) 



m 

The (unknown in closed form) quantities n(V) and D(V) are estimated from ensembles 
of brief simulations initialized at V: 

»(V) = hm Q (AV)/At 2D(V) = lim ([AV} 2 )/At (8) 

where AV = V(t + At) — V(t) and the average is taken over the realization ensemble. 
More accurate, maximum likelihood based estimation techniques [1] can also be used. 
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Figure 2. Simulation results for system (J3j- ((4j) . Top row: position of the peak in the 
u profile as a function of time; second and third rows: V(t) and $^ (defined in the 
text), respectively for the same time interval; bottom row: spatial profiles of u (cross) 
and a (filled circles) at times marked in top panel. Parameters arc the same as in 
figure [TJ 
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At stationarity we have P(V) oc exp [— /3$(V)] where $(V) is the effective potential, 
related to n{V) and D(V) by US [35] 

P$(V) = const - / ^Lfo + logD(V). (9) 

Data for the estimation of fJ,(Vo) and -D(Vb) for a given value of Vb can be obtained in 
two ways. Firstly, we find occurrences of this value in a long simulation; we then track 
V(t) over a subsequent time interval (we use 14 time units) for each of these occurrences, 
and then average the appropriate quantities. This approach has been used previously 
to estimate drift and diffusion terms in Langevin equations [Tj5]. Typical results of 
estimating /x(V) and D(V) using this approach are shown in figure HI Performing this 
for a grid of Vq values provides enough data for the numerical estimation of the integral 
in (jHJ); the results for the estimates in figure 0] are shown in figure [51 

In the absence of a database from a long enough (equilibrium) simulation, a a better 
- more economical — way to find fi(V) and D(V), and thus $(V), is to deliberately 
initialize the system with a given value of V$ and then track V(t) over a subsequent 
time interval (we again use 14 time units), repeating and averaging as above. The 
advantage of this approach, as compared with the previous one, is that it need only 
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Figure 4. Estimation of n(V) and D(V) by the statistics of V occurrences in a long 
simulation. Top: estimates of fJ,(V). Bottom: estimates of D(V). Ten runs of length 
200,000 time units were used. Parameters are A = 0.17, r\ = 10~ 4 . Note that the error 
bars are smallest for more probable values of V. 
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Figure 5. The effective potential, $>(V), estimated in different ways. Dashed line 
from assuming that P(V) oc exp [— /3$(V)]; points from assuming a Fokker-Planck 
equation and reconstructing fi(V) and D(V) bythe statistics of V occurrences in a 
long simulation, then evaluating the integral ©. Refer to figured! 

be performed for as many Vq values as necessary for a given accuracy in the integral 
evaluation. Furthermore, for each such Vq value we can reinitialize as many independent 
runs as necessary for accurate estimation at will, without requiring a simulation so long 
that rare values of Vo are revisited enough times. Results of the latter type of calculation 
are shown in figure [6j and figure [7| shows the reconstructed potential from the estimates 
in figure EJ The only significant difference in this case is that initializing the system 
with a particular value of V seems to result in a lower estimate of both D(V) and <&(V) 
when \V\ is large. Recall, however, that these regions are rarely visited in a simulation, 
and thus the discrepancy may be attributed to an insufficiently long database for the 
first approach. 

2.2. Bifurcations 

A traditional bifurcation diagram for the deterministic problem with respect to a 
parameter such as A would involve tracing branches of steady states and constant shape 
travelling waves (which can also be reduced to solutions of a fixed point problem) using 
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Figure 6. Estimation of (i(V) and D(V) by initializing the system at a particular 
value of V and then running for a short time. Top: estimates of fJ-(V). Bottom: 
estimates of D (V). For each value of V, 4000 short bursts were run. Parameters are 
A = 0.17,7/ = 10~ 4 . Note the more uniform errorbar sizes compared with those in 
figure [31 
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Figure 7. The effective potential, $(V), estimated in different ways. Dashed line from 
assuming that P(V) oc exp [— f3$>(V)], points from assuming a Fokker-Planck equation 
and estimating fi(V) and D(V) by initializing the system with a particular value of V 
and observing its short-term behaviour. Refer to figure [51 

pseudo-arclength and branch switching techniques (e.g. [2]). In the coarse-grained 
stochastic case, it is natural to trace instead the zeros of the drift fi(V) as the same 
parameter A is varied, using the same standard bifurcation codes. It is worth noting 
that for the case of state-dependent noise, the local maxima of the probability density 
do not exactly correspond to zeros of fi(V) and one needs instead to find fixed points of 
the effective potential given in (jUJ) (zeros of the right hand side of © differentiated with 
respect to V). For the scalar case, one can implement secant-type iterative methods 
to converge to the zeros of the appropriate function using function estimates only; for 
the multivariable case matrix-free iterative techniques like Newton-Krylov GMRES can 
be used (see [221 E3]). In the cases we study fJ>(V) is well approximated by a cubic 
function; we take advantage of this simplification by estimating fi(V) for just four 
values of V, uniquely defining this cubic whose zeros we can then easily find. Such 
a "noisy" bifurcation diagram is shown in figure [U where we vary A. Stability can 
be determined from the local slope, n'{V), at the fixed point, and is indicated in the 
figure. Parameters of a different nature, such as the noise intensity or colour, can also be 
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Figure 8. Zeros of fi(V) as a function of A, for r\ = 10~ 4 and M = 100. Filled circles 
indicate stable solutions; empty circles, unstable. Also shown is a quadratic curve (in 
V) that best fits the outer branches. For each estimation of fi(V), 10,000 short bursts 
were run. 

varied in this context. Figure [9] shows the noise-induced "effective pitchfork" bifurcation 
that occurs as rj is varied. The location of these effective bifurcations was verified by 
running long simulations in their vicinity (results not shown). Note that if we assume in 
advance that the potential &(V) is symmetric (as the underlying system is), the cubic 
function representing fi(V) would have no quadratic or constant terms, so that only two 
evaluations of //(V) would be needed to approximate the required cubic. 

2.3. Switching times 

One of the most important statistics of our problem is the average time between changes 
of direction, or, more generally, the statistics of these switching times for the bump. 
Based on our effective Fokker-Planck model and Kramers' theory, once we have /3<&(V) 
we can also estimate the average time between switches in direction [13] . as 



Coarse-grained dynamics of an activity bump in a neural field model 



12 



0.3 



0.2 



0.1 



> 



-0.1 



-0.2 



* • 



o o o 



• • . 



o 



10" 



10" 



Tl 



Figure 9. Zeros of fi(V) as a function of n, for A = 0.175 and N — 100. Filled circles 
indicate stable solutions; empty, unstable. For each estimation of n(V), 10,000 short 
bursts were run. 



where V m i n is the value of V at which $ has a minimum, D = [D(0) + D(V m i n )}/2 and 
A$ = $(0) - $(V min ). Measurements of D and $ for rj = 10" 4 and A = 0.17 give 
t 3 x 10 3 . A typical distribution of waiting times from a long simulation is shown in 
figure [TOj The mean for the data shown is 3.5 x 10 3 , in excellent agreement with our 
Kramers' approximation. 

Along the same lines, we can quantify how the average switching time, r, depends 
on parameters. Typical results are shown in figure [HJ Here, the effective potential is 
not estimated from a long simulation database, but rather "on demand" by initializing 
the system at particular values of V, performing short time simulation bursts, and 
processing their results as before. A clear advantage of this approach, as opposed to 
running the system for long enough to measure sufficient occurrences of switches, is that 
large values of r can be inferred from a reasonable number of short simulations. See 
figure [TH for example, where we can "measure" switching times greater than 10 6 . 

For a simulation of a given, fixed duration T, these curves can be used to infer the 
parameter values for which we expect to see direction changes. If r > T, we will typically 
see the bump move in one direction only, whereas if r < T we will see switching, leading 
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to a drop in the absolute value of the distance travelled during the time T. For example, 
from figure [IT] (b) we see that for simulations of length T = 10 3 , rj must be greater than 
approximately 1CT 3 to observe any switching of direction. 

2.4- White versus coloured noise 

In [27] the authors investigated the effects of adding coloured noise to (E}- ([!])■ They 
found that increasing the correlation time of the coloured noise made the bump less likely 
to move. This was rationalized in terms of "frozen" noise (i.e. spatial inhomogeneity) 
which was likely to "pin" the bump, preventing it from moving. Increasing the 
correlation time of the coloured noise was thought of as interpolating between Gaussian 
white noise (with delta function autocorrelation in time) to frozen, spatially structured, 
noise. Here we investigate the effect again, quantifying the influence of noise colour on 
the location of the underlying bifurcation. 

As in [27] we add noise to the system by adding a term r]i(t) to each of (J3j), with 
(r]i(t)) = and {{ViityVji 8 ))} = 2ez/ye~'*~ s '/ A , where = if i ^ j and 1 if i = j. The 
notation {• ■ ■} indicates averaging over the initial distribution of 77(0) values, taken from 
a Gaussian with zero mean and variance 2e. We keep e = 10 -4 and vary the correlation 
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Figure 11. (a) Switching time, r, plotted on a log scale, as a function of A. r\ = 10~ 4 . 
(b) Switching time, t, plotted on a log scale, as a function of r\ (also on a log scale). 
A = 0.175. For both panels, r was estimated from a calculation of which itself 
was computed by initializing V at appropriate values and then running to observe its 
short-term evolution. 
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Figure 12. Zeros of fi(V) as a function of A for coloured noise with correlation time 
A = 100 (solid line and asterisks) and A = 1 (dashed line and squares). The curves are 
quadratic functions in V fitted to the data points. 

time A. The results are shown in figure [12] where we have plotted zeros of /x(V) as a 
function of A for A = 1 and A = 100. The figure was computed using the short runs 
initialized at prescribed V values. The results are consistent with those found in [27J, 
where it was seen that coloured noise was more effective at slowing the bump than 
Gaussian white noise of the same power, and that the effect was stronger for longer 
correlation times. The results presented here show that this effect can be rationalized 
in terms of shifting a bifurcation point. Results for several other values of A indicate 
that as A is increased, the bifurcation moves to higher values of A (results not shown). 

3. Diffusion maps and the data-based detection of coarse observables 

The previous results relied on our ability to choose a scalar variable, V, whose value 
correlates with the state of the high- dimensional system (after brief initial transients 
have equilibrated). However, we are often faced with dynamical systems for which 
choosing such a variable is far from obvious. We then need a systematic procedure for 
determining, from the results of a simulation, a good low-dimensional representation 
of the system state. Such data-mining, so-called "manifold learning" techniques have 
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Figure 13. Schematic of datapoints (filled circles) in K 2 that lie along a curve (solid 
line). Euclidean distance is a good measure of the separation between points A, B, 
and C along the curve but is a poor measure of the separation of points D, E, and F 
along the curve. 

been a focus of intense research in recent years [H [9], [H]; we will use here the 
recently developed diffusion map approach [281 129]. A large data ensemble from direct 
simulations of our system can be represented as a cloud of points in a high- dimensional 
space (here, M 200 ); see the schematic in figure fT3l 

When two such points (activity profiles) are very close to each other (i.e. when 
their Euclidean distance is small, as in the case of points A, B and C in the schematic), 
we can consider this distance as representative of the "intrinsic similarity" of the two 
configurations - in some sense, of how easy it is for the system dynamics to cause a 
transition from one configuration to the other. When, however, this Euclidean distance 
is larger than some threshold (as in the case of points D, E and F in the schematic) 
the Euclidean distance stops being a good measure of the "intrinsic similarity" between 
configurations - the "effort to transition" from E to F, measured by the arclength 
between them, is clearly less than the effort to transition from E to D, even though 
the Euclidean distances DE and EF are similar. The main idea underpinning diffusion 
maps is to perform a random walk on a graph in which data points are vertices, and 
connection strengths between data points are given by a Gaussian kernel of the form 
K (x, y) = exp (- ||x - y|| 2 /a 2 ). 

When the points are farther away than a cutoff (controlled by the parameter a) 
the vertices are effectively disconnected; when they are very close, the strength of the 
connection is controlled by their (small) Euclidean distance. A random walk on such 
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a graph gives rise to a diffusion distance - starting from the same source point and 
diffusing on the graph for some time, we then look for equal density contours; points 
on such a contour are equally easy to access from the source point, and are therefore at 
equal diffusion distance from it, even though their Euclidean distances from the source 
may vary substantially The procedure also provides a set of transformed coordinates; 
the Euclidean distance in these new coordinates is a true measure of intrinsic datapoint 
similarity. The procedure can be thought of as a nonlinear generalization of Principal 
Component Analysis |21j . 

In our case, we start from a long simulation (of 30,000 time units) during which 
we sample the u(x),a(x) profile every 8 time units, giving N = 3,750 data points. 
(From now on, we use the same parameter values as those in figure [TJ) Using the 
procedure briefly outlined in the Appendix we process these data forming a Markov 
matrix whose leading eigenvectors provide useful "reduction coordinates" for our data - 
in particular, the second eigenvector <3> 2 of this matrix is our computer-assisted candidate 
coarse observable (the first eigenvector, corresponding to the eigenvalue 1, is trivial). 
Figure UM shows the simulation data projected on the first two "reduction coordinates" 
- the first two nontrivial eigenvectors $2 and $3 of the appropriate Markov matrix. The 
right panel of figure HH shows a three-dimensional diffusion map, in terms of the first 
three nontrivial reduction coordinates. The coordinates of the i datapoint in this map 
are ($2 > ^3 > ^a) • Simulation points in the diffusion map are coloured according to 
their associated value of the empirical coordinate V - it is clear that points are ordered 
along the curve by their values of this known reaction coordinate. This sorting of high- 
dimensional data vectors is obtained in an "automated" fashion by the diffusion map 
calculation. 

All data instances clearly collapse on a one-dimensional curve. Figure [15] indicates 
five representative data points (at values of $2 = —1-8, —1, 0, 1, and 1.8) on this curve. 
The insets, showing brief space-time plots initialized at the corresponding data points, 
clearly indicate that right-moving states reside on one end of this curve, left moving 
states on the opposite end, and transition states between the two reside in the interior 
of the curve. It is clear that the curve is one-to-one with the variable $2 (for every value 
of $2 there exists one point on the curve); this suggests that $2 would be a good scalar 
observable for our system. Figure [2] (third row) shows the evolution of this data-based 
observable during our system dynamics; it is clearly capable of capturing the direction 
switches in the system in a manner comparable with the "experience-based" coordinate 
V. Another possible scalar observable we considered is the arc-length distance along 
the curve connecting components of the top two eigenvectors. The variation in this 
observable is comparable to that of $ 2 and also accurately describes the direction of 
travel of the bump during the simulation (results not shown). Figure [16] plots the 
simulation data in terms of the two coarse coordinates: the experience-based V and the 
data-based $ 2 - At first sight, a one-to-one correspondence between the two observables 
is apparent, suggesting that our experience-based coordinate was indeed a good choice. 

The processing of the database from a long, equilibrium run simulation that gave 
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Figure 14. Left panel: Diffusion map plotting components of the top two significant 
eigenvectors of Markov matrix M (defined in Appendix) constructed from simulation 
data. Datapoints are shaded by (and appear ordered according to) the value of the 
empirical coordinate V(t) (defined in Section[T]). The inset shows the leading eigenvalue 
spectrum for the diffusion map. Right panel: 3D diffusion map plotting components 
of top three significant eigenvectors 



us an effective potential in terms of the observable V can now also be performed in 
terms of the computer-assisted observable $2; the resulting two- well effective potential 
and the corresponding effective Langevin description are practically isomorphic to the 
l/-based description. The effective potential computed in terms of the computer-assisted 
observable $2 is shown in figure [TTJ 

3.1. Computing with diffusion map coordinates 

When we attempt to directly construct the effective potential in terms of the diffusion 
map coordinate $2 (i-e., not from processing a long run database), it becomes necessary 
to initialize short simulation bursts consistent with particular values of $2 as well as 
observe the $2 values corresponding to results of detailed simulation. The diffusion map 
calculation that leads to the identification of the coordinate $2 automatically provides 
its value on each of the data points used in the calculation (see the Appendix). It 
therefore becomes important to establish computational protocols that routinely allow 
the translation between physical coordinates and diffusion map coordinates. This 
translation must be possible in both directions: to initialize a short system run we 
need to translate a value of $2 to system initial conditions (living in M 200 ); processing 
of the short simulation bursts involves extracting the $2 values of the resulting states. 
These translation operations are termed "lifting" and "restriction" in the equation-free 
framework [HI [23]. We now briefly discuss possible implementations of such protocols, 
starting with the restriction: obtaining the $2 values of new system configurations, not 
in the original database, that result from our short simulation bursts. 

For points in the original dataset (used to assemble the neighbourhood matrix K) 
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Figure 15. 2-dimensional diffusion map representation of long simulation data (as 
shown in figure If 4[) . The insets show representative local solution evolution for u 
profiles initialized at 5 different regions of the map ($2 = — 1.8,— f, 0,1, and 1.8). 
Shadebar shown at top of figure indicates u values in space-time plots. 



eigenvalues and eigenvectors of the symmetric kernel M s are related by 



Ms*, = \ s * s . 



For a particular point x, in the dataset we have 



A? 



(11) 



(12) 



k=l 



For a new datapoint ^ new ) the same formula should be valid. However, while the right 
hand side is unknown, the left hand side may be computed for the new datapoint, which 
gives the following formula for the extension of the diffusion map 

N 



(new) 
j 



r (new) 



(A:) 



(13) 



k=l 



where M s is a generalized kernel which extends to new sample points and is given by 



(new) 



,XfcJ 



(14) 



N ^E y (K(*™>),y)) E y (AT(x fc> y')) ' 

Equation ([TBI is called the Nystrom formula [3] and allows eigenvector components 
associated with new data vectors x^ new \ outside of the original sample, to be computed 
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Figure 16. Coordinate V(t) versus top significant eigenvector coordinate $^ f° r l° n g 
simulation results. 
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Figure 17. Effective potential /3<I> estimated in terms of diffusion map coordinate <I> 2 ■ 
Dashed line: from assuming that P($ 2 ) ^ ex P [ — points from assuming a Fokker- 
Planck equation, reconstructing //($ 2 *^) and £>(<!> 2 ^) and computing the integral ©. 



Coarse-grained dynamics of an activity bump in a neural field model 



21 



by eigenspace interpolation instead of repeated eigendecomposition of the Markov 
matrix augmented by the new data vectors. The expectations appearing in the 
denominator of ( |T4l) are computed from the empirical data using 

1 - 

£ y (^(x,y)) = -]T^(x,y,,). (15) 
i=l 

Use of the Nystrom formula (|T3|) is central to the efficient computation of diffusion 
map coordinates associated with a high- dimensional vector {restriction) and also the 
preparation of a data vector x {lifting) with desired diffusion map coordinate values. 
Equation ( fl~3l) is used to first compute the eigenvector components ^™ ew ^ (associated 
with the symmetric matrix M s ); diffusion map coordinates (associated with M) for the 
new data point may then be computed using 

$ (ne W ) = _J_ (16) 
J ^j(new) x ' 

The Nystrom formula allows calculation of the diffusion map eigenvector 
components <&"f ew associated with a new data vector ^i new )_ A full eigendecomposition is 
typically performed first for a representative subset of simulation datapoints, identifying 
the relevant eigenvectors. The Nystrom formula is then used to perform the restriction 
operation in (|T3l) which amounts to interpolation in diffusion map space. 

Our protocol for lifting from prescribed diffusion map coordinates to consistent 
system states using stochastic optimization is presented in figure [THJ The main step 
in this lifting procedure, which produces u{x) and a{x) solution profiles with specified 
diffusion map coordinates, is the minimization of a quadratic objective function given 
by 

Obj{*f) = XoBj^f - ®T 9 ? (17) 

where Xobj,2 is a weighting parameter that controls the shape of the objective away from 
its minimum at $2 = ■ The implicit dependence of the eigenvector components 
$j (associated with a data vector x^) on x*^ and all other data vectors used in 
the construction of the neighbourhood matrix K makes this optimization problem 
challenging. We used here, for simplicity, the method of Simulated Annealing (SA) 
[2U [32] to minimize the objective function defined in ffTTl) . and identify a data vector 
x ( tar- 9) w ith the target diffusion map coordinate <3> 2 ar9 . Table [1] compares drift and 
diffusion coefficient values (at $2 — —0.5) estimated by multiple short simulation bursts, 
initialized using this lifting protocol, with values computed from a long time simulation. 
The agreement is very good. 

Given these protocols, the entire effective potential and, more generally, all the 
bifurcation/switching time computations performed above with our empirical V coarse 
variable can be repeated with the $2 variable. "Coarse-grained smoothness" in the 
diffusion map coordinates can be exploited to guide the efficient exploration of the 
effective potential surface [T7] . 
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06/H' ) ) = V- 2 H ) -*r) 



Figure 18. Protocol for lifting from diffusion map coordinates using Simulated 
Annealing (SA). We show (top left) normalized u and a solution profiles which define 
the trial vector x^*" a ". This 200-dimensional vector is restricted, the "location" of 
the data vector in the diffusion map is identified (top right) and the corresponding 
objective value computed (bottom). The objective value guides the selection of a new 
search direction (in R 200 ), generating a new set of solution profiles (a new x('™ a ^) and 
the above procedure is repeated. 

Table 1. Drift and diffusion coefficients at diffusion map coordinate value $2 = ~0.5 
estimated by finding occurrences of $2 m a long simulation (database) and by direct 
initialization at this value (lifting). 



Method 


M$2 = $2) 


£($2 = $2) 


Database 


-3.51 x 1(T 3 


3.30 x 10~ 3 


Lifting 


-3.39 x 1(T 3 


3.20 x 10~ 3 



4. Discussion and Conclusion 

In this paper we have revisited a stochastic spatio-temporal pattern forming system 
originally used to model activity in the cortex. Previous work [27] investigated the 
effects of changing two parameters (spike frequency adaptation strength and noise 
intensity) on the dynamics of a spatially-localized "bump" of neural activity. Here, 
we have concentrated on deriving and using an effective low-dimensional description 
of the system in terms of a single scalar coarse variable. This variable was assumed 
to satisfy an (unknown) effective Langevin equation, so that its probability density 
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satisfies an effective Fokker-Planck equation. By appropriate processing of the results 
of a long simulation, or — more efficiently — by appropriately initializing short bursts 
of simulation of the full system, we estimated the drift and diffusion functions that 
appear in the Fokker-Planck equation, and thus determined an "effective potential" for 
the system on demand. In the second part of the paper we showed how a similar analysis 
can be performed using a variable that is extracted by data-mining a sufficiently long 
system simulation in an automated fashion, using the diffusion map approach [28, 29J. 
Such an approach could prove particularly useful for other systems, since in general a 
good coordinate (or coordinates) for a low-dimensional description of a complex system 
may not be known, nor easy to guess. The results obtained using this new variable 
showed good correspondence with those obtained in the first part of the paper. 

Considering more general systems, it is known that in certain limiting cases (e.g. 
weak coupling) it is possible to explicitly obtain accurate reduced descriptions (e.g. 
phase oscillator equations) for large coupled neural systems [TBI 120] • Clearly, if such 
equations can be derived, they are much easier to use than the approach presented 
here; our approach is intended for cases where we believe the reduction is possible, but 
cannot be explicitly performed. One of the most challenging tests for a reduction — as 
we move away from the conditions where we can guarantee its validity analytically — is 
to test, on line, whether it is accurate, and — importantly — whether a different level 
reduction, with more (or even possibly with fewer) variables is in order. In our case, this 
would correspond to devising tests to suggest, on line, that more than one coarse-grained 
variable must take part in our effective Langevin equation. The slight "thickness" of the 
line in figure [TBI could be an indication that, for the parameter values used, more than 
one variable may be necessary in our model. Devising such tests — in effect, testing 
the hypothesis that the data are locally consistent with a particular model, e.g. a 
scalar Langevin equation — is the subject of ongoing research across several disciplines 
(statistics, financial mathematics). Integrating such tools in a multiscale simulation 
framework is only just starting. 

Even though the model discussed here is rather abstract (from a computational 
neuroscience point of view) it adds to the body of work demonstrating that it is 
possible to obtain useful low- dimensional descriptions of complex systems, both in neural 
modelling [26] and elsewhere [131 E3, [35] . These descriptions not only provide insight 
into the fundamental dynamics, but enable one to simulate and analyse such systems in 
an efficient manner. 

Coarse-graining large scale, faithful neural network computations constitutes an 
important subject of intense current research (see, for example, the probability density 
approach [3 [251 [30]). Multiscale, coarse-graining numerical algorithms such as the one 
demonstrated here have an important role to play in elucidating the types of behavior 
possible for such models and their parametric dependence. 
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Appendix 

We run a simulation for 30,000 time units and sample the u(x),a(x) profile every 8 
time units, giving N = 3,750 data points. We have discretised space with M = 100 
points, so every data point is actually a vector of 200 values. The components of the 
simulation data vector x for our problem are determined by the 2M nodal values of 
the dependent variables and a, (appearing in (jSJ)-®) at a particular time step. We 
normalize each Ui profile by its maximum value (at node i ma x) and normalize the a, 
profile by its nodal value at the corresponding location (ai max ). Finally, we shift both 
solution profiles by a fixed number of nodes such that the alignment between Ui profiles 
is maximized (in a least squares sense) thereby factoring out translations. We define 
a pairwise similarity ("neighbourhood") matrix K between a representative sample of 
these data vectors (collected over the course of a simulation run) as follows 

2" 



K i :j = K (xj, Xj) = exp 



x» - x 



a 



(A.1) 



where a is a parameter that defines the size of the local neighbourhood surrounding each 
high-dimensional point. Defining the diagonal normalization matrix = Ylj-K-iji we 
construct the Markovian matrix M = D~ K. A few top eigenvectors of the matrix M 
are used here as a low dimensional representation of the simulation data. Components in 
the eigenvector provide the low dimensional coordinate for each simulation data vector. 
The diffusion map distance is equal to Euclidean distance in the diffusion map space. 
In many applications the spectrum of the matrix M possesses a spectral gap, and this 
diffusion distance may be approximated by using only a few top eigenvectors. This 
approximation has been shown to be optimal under a particular mean squared error 
criterion [28] . 

The matrix M is adjoint to the symmetric matrix M s defined as follows 

M s = D 1/2 MD~ 1/2 = D 1/2 KD 1/2 . (A.2) 

M and M s share the same eigenvalues; eigenvectors $j of M are related to those of 
M s , denoted as follows 

<&, = D- 1 / 2 *, (A.3) 

We compute the largest eigenvalues and eigenvectors of the symmetric matrix M s 
and use flA.31) to evaluate the eigenvectors of the Markov matrix M. We note that 
the top eigenvector of M s , corresponding to the largest eigenvalue Ai = 1, has 
components equal to the diagonal entries of D 1//2 ; it follows from (1A.3j) that the top, 
trivial, eigenvector 3>i of M consists entirely of ones. 
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